Coarse-to-Fine Adaptive People Detection for Video Sequences by Maximizing Mutual Information
Applying people detectors to unseen data is challenging since pattern distributions, such
as viewpoints, motion, poses, backgrounds, occlusions, and people sizes, may differ significantly
from those of the training dataset. In this paper, we propose a coarse-to-fine framework to adapt
people detectors frame by frame during runtime classification, without requiring any additional
manually labeled ground truth apart from the offline training of the detection model. Such adaptation
makes use of the mutual information among multiple detectors, i.e., the similarities and dissimilarities
estimated by pair-wise correlating their outputs. Globally, the proposed adaptation
discriminates between relevant instants in a video sequence, i.e., it identifies the frames that are
representative enough for an adaptation of the system. Locally, it identifies the best configuration
(i.e., the detection threshold) of each detector under analysis by maximizing the mutual information
among detectors. The proposed coarse-to-fine approach does not
require training the detectors for each new scenario and uses standard people detector outputs, i.e.,
bounding boxes. The experimental results demonstrate that the proposed approach outperforms
state-of-the-art detectors whose optimal threshold configurations were previously determined and
fixed using offline training data.
This work has been partially supported by the Spanish government under the project TEC2014-53176-R (HAVideo).
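As a rough illustration of the local adaptation step described above, the sketch below picks one detector's threshold by maximizing an IoU-based agreement with a second detector's output. This is only an assumption-laden stand-in for the paper's mutual-information criterion, not the authors' implementation; all function names, the IoU matching, and the scoring rule are hypothetical.

```python
# Illustrative sketch (NOT the authors' method): select a detection threshold
# for detector A by maximizing agreement with detector B's bounding boxes,
# a simple proxy for pair-wise mutual information between detector outputs.

def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def agreement(dets_a, dets_b, min_iou=0.5):
    """Count detections of A that overlap some detection of B."""
    return sum(any(iou(box_a, box_b) >= min_iou for box_b, _ in dets_b)
               for box_a, _ in dets_a)

def best_threshold(dets_a, dets_b, candidates, min_iou=0.5):
    """Pick the candidate threshold for detector A that keeps boxes
    agreeing with B while penalizing boxes B does not confirm."""
    def score(t):
        kept = [(b, s) for b, s in dets_a if s >= t]
        matched = agreement(kept, dets_b, min_iou)
        return matched - (len(kept) - matched)
    return max(candidates, key=score)
```

In practice the agreed/disagreed counts would be computed per selected frame and combined over all detector pairs, but the core idea of sweeping a threshold and scoring pair-wise output correlation is the same.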
Enhancing Multi-Camera People Detection by Online Automatic Parametrization Using Detection Transfer and Self-Correlation Maximization
Finding optimal parametrizations for people detectors is a complicated task due to the large
number of parameters and the high variability of application scenarios. In this paper, we propose a
framework to adapt and improve any detector automatically in multi-camera scenarios where people
are observed from various viewpoints. By accurately transferring detector results between camera
viewpoints and by self-correlating these transferred results, the best configuration (in this paper,
the detection threshold) for each detector-viewpoint pair is identified online without requiring any
additional manually-labeled ground truth apart from the offline training of the detection model. Such
a configuration consists of setting the confidence detection threshold of each people detector,
a critical parameter affecting detection performance. The experimental results
demonstrate that the proposed framework improves the performance of four different state-of-the-art
detectors (DPM, ACF, Faster R-CNN, and YOLO9000) whose Optimal Fixed Thresholds (OFTs) have
been determined and fixed during training time using standard datasets.
Keywords: self-correlation maximization; multi-camera; people detection; automatic
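The transfer-and-correlate idea in this second abstract can be sketched as follows. The snippet below (an illustrative assumption, not the authors' code) projects the ground point of each bounding box from one view to another through a ground-plane homography, then sweeps the second view's detection threshold to maximize a simple correlation score; the homography H, the pixel tolerance, and the scoring rule are all hypothetical choices.

```python
# Illustrative sketch (NOT the authors' method): transfer detections from
# view A to view B via a ground-plane homography, then choose view B's
# threshold by maximizing agreement with the transferred detections.
import numpy as np

def transfer_points(boxes, H):
    """Map the bottom-center point of each box (x1, y1, x2, y2) in view A
    to view B through the 3x3 homography H."""
    pts = np.array([[(x1 + x2) / 2, y2, 1.0] for x1, y1, x2, y2 in boxes])
    proj = pts @ H.T
    return proj[:, :2] / proj[:, 2:3]  # back from homogeneous coordinates

def self_correlation(transferred, dets_b, t, tol=20.0):
    """Score = transferred points matched to a nearby kept detection in
    view B, minus surplus kept detections nothing was transferred to."""
    kept = np.array([[(x1 + x2) / 2, y2]
                     for (x1, y1, x2, y2), s in dets_b if s >= t])
    if kept.size == 0:
        return -len(transferred)
    d = np.linalg.norm(transferred[:, None, :] - kept[None, :, :], axis=2)
    matched = int((d.min(axis=1) <= tol).sum())
    return matched - (len(kept) - matched)

def best_threshold_b(boxes_a, dets_b, H, candidates):
    """Select view B's detection threshold online from view A's output."""
    transferred = transfer_points(boxes_a, H)
    return max(candidates, key=lambda t: self_correlation(transferred, dets_b, t))
```

A real multi-camera setup would estimate H from calibration, transfer results between every pair of viewpoints, and aggregate the correlation over many frames, but the per-pair threshold search follows this pattern.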